A Proof-of-Concept of D Record Mining using Domain-Dependent Data

نویسندگان

  • Yeong Su Lee
  • Michaela Geierhos
  • Sa-Kwang Song
  • Hanmin Jung
چکیده

Our purpose is to perform data record extraction from online event calendars exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D) completely based on language-specific key expressions and HTML patterns to recognize every single event given on the investigated web page. One of the most remarkable advantages of our method is that it does not require any additional classification steps based on machine learning algorithms or keyword extraction methods; it is a so-called one-step mining technique. Moreover, another important criteria is that our system is robust to DOM and layout modifications made by web designers. Thus, preliminary experimental results are provided to demonstrate proof-of-concept of such an approach tested on websites in the German opera domain. Furthermore, we could show that our proposed technique outperforms other data record mining applications run on event sites.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...

متن کامل

Concept drift detection in business process logs using deep learning

Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...

متن کامل

Prediction of mineral deposit model and identification of mineralization trend in depth using frequency domain of surface geochemical data in Dalli Cu-Au porphyry deposit

In this research work, the frequency domain (FD) of surface geochemical data was analyzed to decompose the complex geochemical patterns related to different depths of the mineral deposit. In order to predict the variation in mineralization in the depth and identify the deep geochemical anomalies and blind mineralization using the surface geochemical data for the Dalli Cu-Au porphyry deposit, a ...

متن کامل

Prediction of dispersed mineralization zone in depth using frequency domain of surface geochemical data

Discrimination of the blind and dispersed mineralization deposits is a challenging problem in geochemical exploration. The frequency domain (FD) of the surface geochemical data can solve this important issue. This new exploratory information can be achieved using the interpretation of FD of geochemical data, which is impossible in spatial domain. In this research work, FD of the surface geochem...

متن کامل

Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms

The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012